10 research outputs found

    ODA-based modeling for document analysis

    Get PDF
    This article proposes the document model of a hybrid knowledge-based document analysis system for business letters. The model combines requirements for an object-oriented representation of both documents and the knowledge necessary for analysis tasks, and is based on the ODA platform. Model-driven document analysis increases the flexibility of a system because several analysis specialists can be used in cooperation to assist each other and to improve the analysis results. The inherent modularity of the system allows knowledge sources and integral constituents of the architecture to be reused for other document classes such as forms or cheques.

    Representation of non-convex time intervals and propagation of non-convex relations

    Get PDF
    For representing natural language expressions with temporal repetition, the well-known time interval calculus of Allen [Allen 83] is not adequate. The fundamental concept of this calculus is that of convex intervals, which have no temporal gaps. However, natural language expressions like "every Summer" or "on each Monday" require the possibility of such temporal gaps. Therefore, we have developed a new calculus based on non-convex intervals and have defined a set of corresponding non-convex relations. The non-convex intervals are sets of convex intervals and contain temporal gaps. The non-convex relations are triples: a first part specifying the intended manner of the whole relation, a second part defining relations between subintervals, and a third part declaring relations between the whole, convexified non-convex intervals. The convex intervals and relations of Allen are also integrated into the non-convex calculus as a special case. Additionally, we have elaborated and fully implemented a constraint propagation algorithm for the non-convex relations. In comparison with the convex case, we obtain a more expressive calculus with the same time complexity for propagation, differing only by a constant factor.
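
    The core data structures admit a compact sketch. Below is a minimal Python illustration (our own hedged sketch, not the implementation described in the paper) of the idea that a non-convex interval is a set of convex intervals, and that the third component of a relation triple is declared over the convexified hulls:

```python
# Illustrative sketch only (not the authors' implementation): a convex
# interval is a (start, end) pair without gaps; a non-convex interval
# is a sorted list of disjoint convex intervals, so it may contain gaps.

def convexify(nc):
    """Smallest convex interval covering a non-convex interval (its hull)."""
    return (nc[0][0], nc[-1][1])

def before(a, b):
    """Allen's 'before' relation between two convex intervals."""
    return a[1] < b[0]

# "on each Monday" over three weeks: a non-convex interval with gaps
mondays = [(0, 1), (7, 8), (14, 15)]

# The hull spans the gaps and is again a convex interval
print(convexify(mondays))                              # (0, 15)

# The third part of a relation triple declares relations between the
# whole, convexified intervals, so Allen's relations apply unchanged
summer = [(20, 22)]
print(before(convexify(mondays), convexify(summer)))   # True
```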

    An expectation-driven coordinator for partial text analysis

    Get PDF
    This paper presents the coordinating component of a system for expectation-driven text analysis in the restricted domain of German business letter documents. For this purpose, essential concepts and data structures for modeling the domain, the message model, were developed (see [Gores & Bleisinger 92]). Using this message model, the component controls the extraction of information from the text of a given letter document. It is supported in its work by specialists, so-called substantiators, which operate on the text. This requires intensive use of the information in a lexicon. The result is represented in a form that facilitates further processing, such as semantic interpretation and the generation of new actions based on it.

    A model for the representation of message types

    Get PDF
    In this paper we present a formalism that enables a computer-suitable representation of different classes of business letters. The starting point of these efforts is the project ALV (Automatisches Lesen und Verstehen, automatic reading and understanding), whose goal is the partial recognition of a restricted set of business letters. For the different classes of business letters, so-called message types are developed, which are composed of individual building blocks, the message elements. These are defined by means of a modified Conceptual Dependency notation. The hierarchical and modular definition makes it possible to model a broad range of business letters. Besides the efficient modeling of message types, the model presented here aims at supporting image processing through a procedure for expectation-driven text analysis in ALV. To make this possible, numerous control elements were incorporated into the message model.
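
    The hierarchical, modular composition of message types from message elements can be sketched roughly as follows; this is a hedged illustration in Python, and all class and field names are invented for the sketch, not taken from ALV:

```python
# Hedged sketch of the idea only: a message type is composed of message
# elements, and a hierarchical definition lets specialized letter
# classes inherit elements from more general ones.
from dataclasses import dataclass, field

@dataclass
class MessageElement:
    name: str            # e.g. "sender", "order-item" (invented examples)
    expectation: str     # hint used to steer expectation-driven analysis

@dataclass
class MessageType:
    name: str
    elements: list = field(default_factory=list)
    parent: "MessageType | None" = None

    def all_elements(self):
        """Own elements plus those inherited from parent message types."""
        inherited = self.parent.all_elements() if self.parent else []
        return inherited + self.elements

business_letter = MessageType("business-letter",
                              [MessageElement("sender", "address block")])
order = MessageType("order",
                    [MessageElement("order-item", "quantity + product")],
                    parent=business_letter)

print([e.name for e in order.all_elements()])  # ['sender', 'order-item']
```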

    HYPERBIS: a corporate hypermedia information system

    Get PDF
    Hypermedia systems have recently been attracting ever greater attention, which is reflected in many conferences and workshops. This report considers the development of a corporate information system using a hypermedia system. The intention of this approach was to manage as much information about the DFKI as possible (in particular about the existing organization, the employed staff, the conducted projects, and the rooms in use) in a uniform way on a computer, and to present it effectively on different occasions. The HYPERBIS system is described both from a development perspective and from a user perspective. On the one hand, the partly difficult acquisition and analysis of information about the DFKI and its subsequent mapping into hypermedia structures are discussed. On the other hand, the comfortable user interface and the helpful maintenance functions are explained in detail.

    Theoretical consideration of goal recognition aspects for understanding information in business letters

    Get PDF
    Businesses are drowning in information: paper forms, e-mail, phone calls, and other media outpace managers' ability to handle and process information. Traditional computer systems do not support business workflows because of their inflexibility and their lack of understanding of information. A sophisticated understanding of the meaning of a business letter requires an understanding of why the sender wrote it. This paper describes some ideas for using goal recognition techniques as one method to initiate information understanding. It brings together two areas of cognition: goal recognition and document understanding. To this end, it gives an overview of the application of goal recognition techniques to the discovery of the overall purpose of a letter and a coherent explanation of how the individual sentences are meant to achieve that purpose.

    Text skimming as a part in paper document understanding

    Get PDF
    In our document understanding project ALV we analyse incoming paper mail in the domain of single-sided German business letters. These letters are scanned, and after several analysis steps the text is recognized. The result may contain gaps, word alternatives, and even illegal words. The subject of this paper is the subsequent phase, which concerns the extraction of important information predefined in our "message type model". An expectation-driven partial text skimming analysis is proposed, focussing on the kernel module, the so-called "predictor". In contrast to traditional text skimming, the following aspects are important in our approach. Basically, the input data are fragmentary texts. Rather than having only one text analysis module ("substantiator"), our predictor controls a set of different and partially alternative substantiators. With respect to the usually proposed three working phases of a predictor (start, discrimination, and instantiation) the following differences are remarkable. The starting problem of text skimming is solved by applying specialized substantiators that classify a business letter into message types. In order to select appropriate expectations within the message type hypotheses, a twofold discrimination is performed: a coarse discrimination reduces the number of message type alternatives, and a fine discrimination chooses one expectation within one or a few previously selected message types. According to the selected expectation, substantiators are activated. Several rules are applied both for the verification of the substantiator results and for error recovery if the results are insufficient.
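
    The control flow of the three predictor phases can be sketched as follows; this is a deliberately toy Python illustration under our own assumptions (keyword matching stands in for the real substantiators, and all names and data are invented):

```python
# Minimal, self-contained sketch of the predictor's three phases
# (start, discrimination, instantiation); all data here is invented.

def predictor(text, message_types):
    # Start: classify the letter into candidate message types; here a
    # keyword test stands in for the specialized substantiators.
    candidates = {name: kws for name, kws in message_types.items()
                  if any(k in text for k in kws)}

    # Coarse discrimination: keep the type with the most keyword hits,
    # reducing the number of message type alternatives.
    best = max(candidates,
               key=lambda n: sum(k in text for k in candidates[n]))

    # Fine discrimination: choose one expectation within that type.
    expectation = next(k for k in candidates[best] if k in text)

    # Instantiation: substantiators for the chosen expectation would now
    # extract and verify the information; here we just report the choice.
    return best, expectation

types = {"order":   ["order", "quantity"],
         "inquiry": ["inquire", "question"]}
print(predictor("we order 5 printers", types))  # ('order', 'order')
```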

    Pi_{ODA}: the paper interface to ODA

    No full text
    In the past, many people have proclaimed the vision of the paperless office, but today offices consume more paper documents than ever before. As computer technology becomes more and more important in the daily practice of modern offices, intelligent systems bridging the gap between printed documents and electronic ones, called paper-computer interfaces, are required. In this report our model-based document analysis system Pi_{ODA} is discussed in detail. Basic ideas of the ODA standard for the electronic representation of office documents are the foundation of our document model. Moreover, different knowledge sources essential for the analysis of business letters are incorporated into the Pi_{ODA} model. The system comprises all important analysis tasks. Initially, layout extraction includes the necessary low-level image processing and segmentation to investigate the layout structure of a given document. While logical labeling identifies the logical structure of a business letter, text recognition explores the captured text of logical objects in an expectation-driven manner. In this way, word hypotheses are generated and verified using a dictionary. Finally, a partial text analysis component syntactically checks well-structured text objects, primarily the recipient of a letter. As output, Pi_{ODA} produces an ODA-conforming symbolic representation of a document originally captured on paper. The document is then available for further automatic processing such as filing, retrieval, or distribution. The inherent modularity of our system, however, allows knowledge sources and constituents of the architecture to be reused for other document classes such as forms or cheques. Additionally, Pi_{ODA} is an open and flexible system: improved and new analysis methods can be integrated easily without modifying the overall system architecture.
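
    The staged analysis chain described above can be sketched as a simple pipeline; this is a hedged Python illustration of the modular control flow only, with trivial placeholder functions that are not the real Pi_{ODA} algorithms:

```python
# Sketch of the Pi_{ODA} analysis chain as a sequence of stages; the
# stage functions are invented placeholders, not the real algorithms.

def layout_extraction(image):
    """Low-level image processing and segmentation into layout blocks."""
    return ["block-1", "block-2"]

def logical_labeling(blocks):
    """Assign logical roles (recipient, body, ...) to layout blocks."""
    return {b: ("recipient" if b == "block-1" else "body") for b in blocks}

def text_recognition(labels):
    """Generate word hypotheses per logical object, expectation-driven."""
    return {role: ["word-hypothesis"] for role in labels.values()}

def partial_text_analysis(words):
    """Syntactically check well-structured objects, e.g. the recipient."""
    return {"recipient-ok": "recipient" in words}

# Because the system is modular, any stage can be replaced or improved
# without changing the overall control flow:
result = "scanned-letter.png"
for stage in (layout_extraction, logical_labeling,
              text_recognition, partial_text_analysis):
    result = stage(result)
print(result)  # {'recipient-ok': True}
```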

    An approach to integrated office document processing & management

    No full text
    We propose an approach towards an integrated document processing and management system intended to capture essentially freely structured documents, like those typically used in the office domain. The document analysis system ANASTASIL is capable of revealing both the structure and the contents of complex paper documents. Moreover, it facilitates the handling of the information they contain. Analyzed documents are stored in the management system KRISYS, which is connected to several different subsequent services. The described system can be considered an ideal extension of the human clerk, making his tasks in information processing easier. The symbolic representation of the analysis results allows an easy transformation into a given international standard, e.g., ODA/ODIF or SGML, and interchange via global networks.